A martingale-transform goodness-of-fit test for the form of the conditional variance

Holger Dette, Benjamin Hetzler

Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany

e-mail: holger.dette@ruhr-uni-bochum.de, benjamin.hetzler@ruhr-uni-bochum.de

September 29, 2008 

Abstract 

In the common nonparametric regression model the problem of testing for a specific parametric form of the variance function is considered. Recently Dette and Hetzler (2008) proposed a test statistic which is based on an empirical process of pseudo residuals. The process converges weakly to a Gaussian process with a complicated covariance kernel depending on the data generating process. In the present paper we consider a standardized version of this process and propose a martingale transform to obtain asymptotically distribution free tests for the corresponding Kolmogorov-Smirnov and Cramér-von Mises functionals. The finite sample properties of the proposed tests are investigated by means of a simulation study.

1 Introduction

We consider the common nonparametric regression model

(1.1)  $Y_{i,n} = m(t_{i,n}) + \sigma(t_{i,n})\,\varepsilon(t_{i,n}), \quad i = 1,\dots,n,$

where $\varepsilon_{1,n},\dots,\varepsilon_{n,n}$ with $\varepsilon_{i,n} := \varepsilon(t_{i,n})$ are assumed to form a triangular array of rowwise independent random variables with mean $0$ and variance $1$, and $m$ and $\sigma^2$ denote the unknown regression and variance function, respectively. In the regression model (1.1) the quantities $0 \le t_{1,n} < t_{2,n} < \dots < t_{n,n} \le 1$ denote the explanatory variables satisfying

(1.2)  $\frac{i}{n+1} = \int_0^{t_{i,n}} f(t)\,dt, \quad i = 1,\dots,n,$

where $f$ denotes a positive density on the interval $[0,1]$ [see Sacks and Ylvisaker (1970)]. Because additional information on the variance function such as homoscedasticity can improve the efficiency of the statistical inference, several authors have considered the problem of testing the hypothesis

(1.3)  $H_0 : \sigma^2(t) = \sigma^2(t,\theta) \quad \forall\, t \in [0,1],$

in the nonparametric regression model (1.1), where $\{\sigma^2(\cdot,\theta) \mid \theta \in \Theta\}$ is a given parametric class of variance functions and $\Theta \subset \mathbb{R}^d$ denotes a finite dimensional parameter space. Most authors consider linear regression models [see e.g. Bickel (1978), Breusch and Pagan (1979), Cook and Weisberg (1983) among others or Pagan and Pak (1993) for a review]. In the nonparametric regression model (1.1) there exist several papers discussing the problem of testing homoscedasticity [see Dette and Munk (1998), Zhu, Fujikoshi and Naito (2001), Dette (2002) or Liero (2003)]. Recently Dette, van Keilegom and Neumeyer (2007) proposed a test for the parametric hypothesis (1.3), which is based on the difference of two empirical processes of standardized nonparametric residuals under the null hypothesis and the alternative. Weak convergence of the resulting process is shown and, because the limit distribution is complicated and depends on certain features of the data generating process, the consistency of a smoothed bootstrap procedure is established. Although the resulting test has nice theoretical and finite sample properties (in particular, it can detect local alternatives converging to the null hypothesis at a rate $n^{-1/2}$), the approach requires rather strong assumptions regarding the differentiability of the variance and regression function. Dette and Hetzler (2008) suggested a procedure which is, on the one hand, able to detect local alternatives at a rate $n^{-1/2}$ and requires, on the other hand, minimal assumptions regarding the smoothness of the regression and variance function. These authors proposed to estimate the process

(1.4)  $S_t(w) = \int_0^t \big(\sigma^2(x) - \sigma^2(x,\theta^*)\big)\sqrt{w(x)}\,f(x)\,dx$

using pseudo residuals [see Gasser, Sroka and Jennen-Steinmetz (1986) or Hall, Kay and Titterington (1990)], where

(1.5)  $\theta^* = \arg\min_{\theta\in\Theta} \int_0^1 \big(\sigma^2(x) - \sigma^2(x,\theta)\big)^2 f(x)\,dx$

is the parameter corresponding to the best approximation of the function $\sigma^2$ by the parametric class $\{\sigma^2(\cdot,\theta) \mid \theta \in \Theta\}$ and $w$ denotes a weight function [which was actually chosen as $w \equiv 1$ by Dette and Hetzler (2008)]. Under very weak smoothness assumptions on the regression and variance function they proved weak convergence of the estimated process, say $(\hat S_t(w))_{t\in[0,1]}$, to a Gaussian process. The Kolmogorov-Smirnov and Cramér-von Mises statistics based on $(\hat S_t(w))_{t\in[0,1]}$ were proposed for testing the hypothesis (1.3). Because the covariance kernel of the limiting process depends on the data generating process in a complicated way, a bootstrap procedure was applied to obtain the critical values.

It is the purpose of the present paper to construct an asymptotically distribution free test for the parametric form of the variance function which is, on the one hand, able to detect local alternatives converging to the null hypothesis at a rate $n^{-1/2}$ and, on the other hand, requires minimal smoothness assumptions. For this purpose we consider a standardized version of the process discussed by Dette and Hetzler (2008), where the weight function is estimated from the data. We apply the martingale transform proposed by Khmaladze (1981, 1993) in order to obtain a distribution free limiting process. This transformation has been used successfully by several authors in goodness-of-fit testing problems for hypotheses regarding the regression function [see Stute, Thies and Zhu (1998), Khmaladze and Koul (2004) or Koul (2006) among others], but to the best of our knowledge it has not been studied in the context of testing hypotheses regarding the variance function. In Section 2 we briefly review the main features of the empirical process proposed by Dette and Hetzler (2008) and introduce a standardized version of this process which will be the basis for our test statistic. In Sections 3 and 4 we consider the martingale transform and show that the transformed (and standardized) empirical process is asymptotically distribution free. In Section 5 we discuss several examples and investigate the finite sample properties of a Cramér-von Mises test based on the martingale transformation, while some of the more technical details are deferred to an appendix.

2 The basic process based on pseudo residuals 

We assume that the regression function $m$, the variance function $\sigma^2$ in (1.1), the design density $f$ and the weight function $w$ in (1.4) are Lipschitz continuous of order $\gamma > 1/2$ and that the moments of order 8 of the errors $\varepsilon_{i,n}$ exist and are uniformly bounded. In general, the moments of order $j \ge 3$ of the errors may depend on the explanatory variables $t_{i,n}$, that is

$m_j(t_{i,n}) = E[\varepsilon_{i,n}^j], \quad j = 3,\dots,8,$

and the functions $m_3$ and $m_4$ are also assumed to be Lipschitz continuous of order $\gamma > 1/2$. For the sake of a transparent presentation we consider for the moment linear hypotheses of the form

(2.1)  $H_0 : \sigma^2(t) = \sum_{j=1}^{d} \theta_j\,\sigma_j^2(t) \quad \text{for all } t \in [0,1],$

where $\theta_1,\dots,\theta_d \in \mathbb{R}$ are unknown parameters and $\sigma_1^2,\dots,\sigma_d^2$ are given linearly independent functions satisfying

(2.2)  $\sigma_j^2 \in \mathrm{Lip}_\gamma[0,1], \quad j = 1,\dots,d.$

The general case of testing hypotheses of the form (1.3) will be briefly discussed at the end of this section. It is shown in Dette and Hetzler (2008) that the process defined in (1.4) can be consistently estimated by

(2.3)  $\hat S_t(w) = \hat B_t^0(w) - \hat B_t^T(w)\,\hat A^{-1}\hat C,$

where the elements of the matrix $\hat A = (\hat a_{ij})_{1\le i,j\le d}$ and the vector $\hat C = (\hat c_1,\dots,\hat c_d)^T$ are defined by

(2.4)  $\hat a_{ij} = \frac{1}{n}\sum_{k=1}^{n} \sigma_i^2(t_{k,n})\,\sigma_j^2(t_{k,n}), \quad 1 \le i,j \le d,$

(2.5)  $\hat c_i = \frac{1}{n-r}\sum_{k=r+1}^{n} R_{k,n}^2\,\sigma_i^2(t_{k,n}), \quad 1 \le i \le d,$

respectively,

(2.6)  $\hat B_t^0(w) = \frac{1}{n-r}\sum_{j=r+1}^{n} 1\{t_{j,n}\le t\}\,\sqrt{w(t_{j,n})}\,R_{j,n}^2$

and $\hat B_t(w) = (\hat B_t^1(w),\dots,\hat B_t^d(w))^T$ with

(2.7)  $\hat B_t^i(w) = \frac{1}{n}\sum_{j=1}^{n} 1\{t_{j,n}\le t\}\,\sqrt{w(t_{j,n})}\,\sigma_i^2(t_{j,n}), \quad i = 1,\dots,d.$

In (2.5) and (2.6) the quantities $R_{j,n}$ denote pseudo residuals defined by

(2.8)  $R_{j,n} = \sum_{i=0}^{r} d_i\,Y_{j-i,n}, \quad j = r+1,\dots,n,$

where the vector $(d_0,\dots,d_r)^T \in \mathbb{R}^{r+1}$ satisfies

(2.9)  $\sum_{i=0}^{r} d_i = 0, \qquad \sum_{i=0}^{r} d_i^2 = 1$

and is called a difference sequence of order $r$ [see Gasser, Sroka and Jennen-Steinmetz (1986) or Hall, Kay and Titterington (1990) among others]. The following result was proved in Dette and Hetzler (2008) and provides the asymptotic properties of the process $\hat S_t(w)$ for an increasing sample size.
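The pseudo residuals in (2.8) are a moving linear combination of the responses, so they can be computed with a single convolution. The following is a minimal sketch, assuming NumPy; the function name `pseudo_residuals` is introduced here for illustration only and does not come from the paper.

```python
import numpy as np

def pseudo_residuals(y, dseq):
    """Return R_j = sum_{i=0}^r dseq[i] * y[j-i] for j = r+1, ..., n (1-based)."""
    y = np.asarray(y, dtype=float)
    dseq = np.asarray(dseq, dtype=float)
    # A difference sequence must satisfy sum d_i = 0 and sum d_i^2 = 1, see (2.9).
    if not (np.isclose(dseq.sum(), 0.0) and np.isclose((dseq ** 2).sum(), 1.0)):
        raise ValueError("dseq is not a valid difference sequence")
    # With mode="valid", output k equals sum_i dseq[i] * y[k + r - i].
    return np.convolve(y, dseq, mode="valid")

# Difference sequence of order r = 1, as used in Section 5.
d1 = np.array([1.0, -1.0]) / np.sqrt(2.0)
res = pseudo_residuals(np.array([1.0, 3.0, 2.0, 2.0]), d1)
```

Because the weights sum to zero, a locally constant regression function is removed from the residuals, which is why no estimate of $m$ is needed at this stage.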

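Theorem 2.1. If the conditions stated at the beginning of this section are satisfied, then the process $\{\sqrt{n}(\hat S_t(w) - S_t(w))\}_{t\in[0,1]}$ converges weakly in $D[0,1]$ to a centered Gaussian process with covariance kernel $k(t_1,t_2)$ given by the non-diagonal elements of the matrix $V_2\,\Sigma_{t_1,t_2}\,V_2^T \in \mathbb{R}^{2\times 2}$, where the matrices $\Sigma_{t_1,t_2} \in \mathbb{R}^{(d+2)\times(d+2)}$ and $V_2 \in \mathbb{R}^{2\times(d+2)}$ are defined by

(2.10)  $\Sigma_{t_1,t_2} = \begin{pmatrix} v_{11} & v_{12} & w_{11} & \cdots & w_{1d} \\ v_{21} & v_{22} & w_{21} & \cdots & w_{2d} \\ w_{11} & w_{21} & z_{11} & \cdots & z_{1d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{1d} & w_{2d} & z_{d1} & \cdots & z_{dd} \end{pmatrix},$

(2.11)  $V_2 = (I_2 \mid U), \qquad U = -\begin{pmatrix} B_{t_1}^T(w)\,A^{-1} \\ B_{t_2}^T(w)\,A^{-1} \end{pmatrix},$

respectively. The vector $B_t(w)$ is defined by

(2.12)  $B_t(w) = \Big(\int_0^t \sigma_1^2(x)\sqrt{w(x)}\,f(x)\,dx,\ \dots,\ \int_0^t \sigma_d^2(x)\sqrt{w(x)}\,f(x)\,dx\Big)^T,$

the elements of the matrix $A = (a_{ij})_{1\le i,j\le d}$ are given by

(2.13)  $a_{ij} = \int_0^1 \sigma_i^2(x)\,\sigma_j^2(x)\,f(x)\,dx, \quad 1 \le i,j \le d,$

the elements of the matrix in (2.10) are defined by

$v_{ij} = \int_0^1 \tau_r(s)\,\sigma^4(s)\,1_{[0,\,t_i\wedge t_j)}(s)\,w(s)\,f(s)\,ds, \quad 1 \le i,j \le 2,$

$w_{ij} = \int_0^1 \tau_r(s)\,\sigma^4(s)\,\sigma_j^2(s)\,1_{[0,\,t_i)}(s)\,\sqrt{w(s)}\,f(s)\,ds, \quad 1 \le i \le 2,\ 1 \le j \le d,$

$z_{ij} = \int_0^1 \tau_r(s)\,\sigma^4(s)\,\sigma_i^2(s)\,\sigma_j^2(s)\,f(s)\,ds, \quad 1 \le i,j \le d,$

with $\tau_r(s) = m_4(s) - 1 + 4\delta_r$, and the quantity $\delta_r$ is given by

(2.14)  $\delta_r = \sum_{m=1}^{r}\Big(\sum_{j=0}^{r-m} d_j\,d_{j+m}\Big)^2.$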
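The constant $\delta_r$ in (2.14) depends only on the difference sequence. The sketch below, assuming NumPy and using the illustrative name `delta_r`, evaluates it directly from the definition. For the order-one sequence $d_0 = -d_1 = 1/\sqrt{2}$ this gives $\delta_1 = 1/4$, so that $\tau_1(s) = m_4(s)$.

```python
import numpy as np

def delta_r(dseq):
    """delta_r = sum_{m=1}^{r} ( sum_{j=0}^{r-m} d_j * d_{j+m} )^2, see (2.14)."""
    dseq = np.asarray(dseq, dtype=float)
    r = len(dseq) - 1
    # For lag m, pair d_0..d_{r-m} with d_m..d_r and square the inner product.
    return sum(float(np.dot(dseq[: r + 1 - m], dseq[m:])) ** 2
               for m in range(1, r + 1))

d1 = np.array([1.0, -1.0]) / np.sqrt(2.0)       # order r = 1
d2 = np.array([1.0, -2.0, 1.0]) / np.sqrt(6.0)  # an order r = 2 example
delta_1 = delta_r(d1)
delta_2 = delta_r(d2)
```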



Note that the null hypothesis (2.1) (or more generally the hypothesis (1.3)) is equivalent to $S_t(w) = 0$ $\forall\, t \in [0,1]$, and consequently rejecting (2.1) for large values of the Kolmogorov-Smirnov or Cramér-von Mises statistic

$K_n = \sqrt{n}\,\sup_{t\in[0,1]} |\hat S_t(w)|, \qquad G_n = n\int_0^1 |\hat S_t(w)|^2\,dF_n(t)$

yields a consistent test. Here $F_n(t) = \frac{1}{n}\sum_{i=1}^{n} 1\{t_{i,n}\le t\}$ is the empirical distribution function of the design points. Moreover, it is demonstrated by Dette and Hetzler (2008) that this test can detect local alternatives which converge to the null hypothesis at a rate $n^{-1/2}$. Because the limiting distribution depends on certain features of the data generating process, these authors proposed a bootstrap procedure to calculate the critical values.
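To make the construction concrete, the following sketch evaluates $\hat S_t(w)$ of (2.3) at the design points and computes $K_n$ and $G_n$ from it. It assumes NumPy and sorted design points; the function names and the toy data (regression function $1+t$, variance $1 + 2t^2$) are illustrative choices made here, not taken from the paper.

```python
import numpy as np

def s_hat_process(t, y, g, dseq, w=None):
    """Evaluate S_hat_t(w) from (2.3) at t = t_1, ..., t_n (t sorted increasingly).

    g is the (n, d) matrix with entries sigma_j^2(t_i); w holds w(t_i), default 1.
    """
    t, y, g = (np.asarray(a, dtype=float) for a in (t, y, g))
    n, r = len(t), len(dseq) - 1
    sw = np.ones(n) if w is None else np.sqrt(np.asarray(w, dtype=float))
    r2 = np.convolve(y, dseq, mode="valid") ** 2   # squared pseudo residuals, j = r+1..n
    a_hat = g.T @ g / n                            # (2.4)
    c_hat = g[r:].T @ r2 / (n - r)                 # (2.5)
    theta = np.linalg.solve(a_hat, c_hat)          # A_hat^{-1} C_hat
    # (2.6): partial sums over j = r+1..n; the first r design points contribute nothing.
    b0 = np.concatenate([np.zeros(r), np.cumsum(sw[r:] * r2)]) / (n - r)
    b = np.cumsum(sw[:, None] * g, axis=0) / n     # (2.7); row k is B_hat at t_k
    return b0 - b @ theta

def ks_and_cvm(s):
    """K_n = sqrt(n) max|S_hat| and G_n = n * integral of S_hat^2 dF_n."""
    n = len(s)
    return np.sqrt(n) * np.max(np.abs(s)), n * np.mean(s ** 2)

rng = np.random.default_rng(1)
n = 200
t = np.arange(1, n + 1) / (n + 1)
y = 1 + t + np.sqrt(1 + 2 * t ** 2) * rng.standard_normal(n)   # toy data
g = np.column_stack([np.ones(n), t ** 2])                      # sigma_1^2 = 1, sigma_2^2 = t^2
s = s_hat_process(t, y, g, np.array([1.0, -1.0]) / np.sqrt(2.0))
k_n, g_n = ks_and_cvm(s)
```

With $w \equiv 1$ and a constant function among the $\sigma_j^2$, the process returns to zero at the last design point, which is a convenient sanity check for an implementation.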

If $(A(t,w))_{t\in[0,1]}$ denotes the limiting process in Theorem 2.1 it follows from the Continuous Mapping Theorem [see Pollard (1984)] that under the null hypothesis

$K_n \stackrel{D}{\longrightarrow} \sup_{t\in[0,1]} |A(t,w)|, \qquad G_n \stackrel{D}{\longrightarrow} \int_0^1 |A(t,w)|^2\,dF(t),$

where $F$ denotes the distribution function of the design points. Using the Lipschitz continuity of the regression and variance function, it was shown in the proof of Theorem 2.1 that the process $A_n(t,w) = \sqrt{n}(\hat S_t(w) - S_t(w))$ exhibits the same asymptotic behaviour as the process

(2.15)  $\bar A_n(t,w) = C_n(t,w) - D_n(t,w),$

where

(2.16)  $C_n(t,w) = \frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} 1\{t_{i,n}\le t\}\,\sqrt{w(t_{i,n})}\,Z_{i,n},$

(2.17)  $D_n(t,w) = B_t^T(w)\,A^{-1}\Big(\frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} Z_{i,n}\,\sigma_j^2(t_{i,n})\Big)_{j=1}^{d},$

the vector $B_t(w) = (B_t^1(w),\dots,B_t^d(w))^T$ and the matrix $A = (a_{ij})_{1\le i,j\le d}$ are defined in (2.12) and (2.13), respectively, and the random variables $Z_{i,n}$ are given by $Z_{i,n} = L_{i,n}^2 - E[L_{i,n}^2]$, with

(2.18)  $L_{i,n} = \sum_{j=0}^{r} d_j\,\sigma(t_{i-j,n})\,\varepsilon_{i-j,n}.$

Because $\{Z_{i,n} \mid i = 1,\dots,n,\ n \in \mathbb{N}\}$ is a triangular array of $r$-dependent random variables, it follows, observing

(2.19)  $E[Z_{j,n}^2] + 2\sum_{m=1}^{r} E[Z_{j,n}Z_{j+m,n}] = \big(m_4(t_{j,n}) - 1 + 4\delta_r\big)\,\sigma^4(t_{j,n}) + O(n^{-\gamma})$

[see Dette and Hetzler (2008)], that the process $\{C_n(t,w)\}_{t\in[0,1]}$ converges weakly in $D[0,1]$ to the process $W\circ\psi$, where $W$ denotes a Brownian motion and the function $\psi$ is defined by

(2.20)  $\psi(t) = \int_0^t \beta(x)\,w(x)\,f(x)\,dx$

with

$\beta(x) = \big(m_4(x) - 1 + 4\delta_r\big)\,\sigma^4(x).$

Note that the transformation $\psi$ depends on the function $\beta$, which is not known because it contains the variance and the fourth moments of the innovations $\varepsilon_{i,n}$. In the following we will use the specific weight function $w(x) = 1/\beta(x)$, for which the function $\psi$ reduces to $\psi(t) = F(t) = \int_0^t f(x)\,dx$ and the process $\{C_n(t,1/\beta)\}_{t\in[0,1]}$ converges weakly to a Brownian motion $W\circ F$. We assume in a first step that the function $\beta$ is known and investigate the martingale transformation of the standardized process

(2.21)  $A_n^0(t) = C_n^0(t) - D_n^0(t),$

where $A_n^0(t) = \bar A_n(t,1/\beta)$,

(2.22)  $C_n^0(t) = C_n(t,1/\beta) = \frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} 1\{t_{i,n}\le t\}\,Z_{i,n}\,\beta^{-1/2}(t_{i,n}),$

(2.23)  $D_n^0(t) = D_n(t,1/\beta) = B_t^T(1/\beta)\,A^{-1}\,\frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} Z_{i,n}\,g(t_{i,n})$

and $g(x) = (\sigma_1^2(x),\dots,\sigma_d^2(x))^T$. In a second step we will estimate the function $\beta$ nonparametrically and consider the corresponding processes standardized by this estimate. More precisely, we will show that the corresponding martingale transform of the process

(2.24)  $\hat A_n(t) = \sqrt{n}\big(\hat S_t(1/\hat\beta) - S_t(1/\hat\beta)\big)$

leads to an asymptotically distribution free test, where $\hat\beta$ is an appropriate estimate of the function $\beta$.

Remark 2.2. For the problem of testing a general nonlinear hypothesis of the form (1.3) we propose to consider the process

$\hat S_t(w) = \hat B_t^0(w) - \frac{1}{n}\sum_{i=1}^{n} 1\{t_{i,n}\le t\}\,\sigma^2(t_{i,n},\hat\theta)\,\sqrt{w(t_{i,n})},$

where

$\hat\theta = \arg\min_{\theta\in\Theta} \frac{1}{n-r}\sum_{i=r+1}^{n}\big(R_{i,n}^2 - \sigma^2(t_{i,n},\theta)\big)^2$

is the least squares estimate of the parameter $\theta^*$ defined by (1.5). In this case it was shown by Dette and Hetzler (2008) that under regularity assumptions the process $\{\sqrt{n}(\hat S_t(w) - S_t(w))\}_{t\in[0,1]}$ exhibits the same asymptotic behaviour as described in Theorem 2.1 for the linear case, where the functions $\sigma_j^2$ have to be replaced by



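$\sigma_j^2(t) = \frac{\partial}{\partial\theta_j}\,\sigma^2(t,\theta)\Big|_{\theta=\theta^*}, \quad j = 1,\dots,d.$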



Thus all results presented in the following section can be transferred to the nonlinear case using 
this identification. 






3 The martingale transform of the process $A_n^0$

It follows by similar arguments as given in Dette and Hetzler (2008) that the process $\{A_n^0(t)\}_{t\in[0,1]}$ defined by (2.21) converges weakly in $D[0,1]$, that is

(3.1)  $A_n^0 \stackrel{D}{\longrightarrow} W\circ F - B_t^T(1/\beta)\,A^{-1}V_0 =: R_\infty^1,$

where $W$ is a Brownian motion and $V_0$ denotes a centered normal random variable with mean $0$ and covariance matrix

$\Sigma = \int_0^1 g(x)\,g^T(x)\,\beta(x)\,f(x)\,dx.$

Because the distribution of the process $R_\infty^1$ is complicated, we consider in this section an operator which maps the process $R_\infty^1$ onto the martingale part in its Doob-Meyer decomposition. Following Khmaladze and Koul (2004) we define a linear operator $T$ such that

(3.2)  $T R_\infty^1 \stackrel{D}{=} R_\infty^0,$

(3.3)  $T\big(B_t^T(1/\beta)\,A^{-1}V_0\big) = 0,$

where the symbol $\stackrel{D}{=}$ denotes equality in distribution and the process $R_\infty^0$ is given by $R_\infty^0 = W\circ F$. For this purpose we consider the matrix

(3.4)  $H(t) = \int_t^1 \beta^{-1}(u)\,g(u)\,g^T(u)\,f(u)\,du$

and define for a function $\eta$ its transformation $T\eta$ by

(3.5)  $(T\eta)(t) = \eta(t) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^1 \beta^{-1/2}(z)\,g(z)\,\eta(dz)\,F(dy),$

where only functions are considered such that the integral on the right hand side of (3.5) exists. Note that the matrix $H(x)$ is non-singular for all $x \in [0,1)$ because the functions $\sigma_1^2,\dots,\sigma_d^2$ are linearly independent; see Achieser (1956). If $\eta$ is a stochastic process on the interval $[0,1]$, the corresponding integral in (3.5) is interpreted as an Itô integral [see Øksendal (2003)]. A straightforward calculation shows that

$T\big(B_t^T(1/\beta)\,A^{-1}V_0\big) = 0,$
$\mathrm{Cov}\big(T R_\infty^0(r),\,T R_\infty^0(s)\big) = F(r\wedge s),$

which yields for the process defined on the right hand side of (3.1)

(3.6)  $T R_\infty^1 = T R_\infty^0 \stackrel{D}{=} R_\infty^0 = W\circ F$

(note that $R_\infty^0$ is a Gaussian process and that the operator $T$ is linear). The following theorem shows that a similar property holds in an asymptotic sense for the process $\{A_n^0(t)\}_{t\in[0,1]}$.

Theorem 3.1. If the assumptions stated in Section 2 are satisfied, then the transformed process $\{T A_n^0(t)\}_{t\in[0,1]}$ converges weakly in $D[0,1]$ to a Brownian motion in time $F$, that is

$\{T A_n^0(t)\}_{t\in[0,1]} \stackrel{D}{\longrightarrow} \{W\circ F(t)\}_{t\in[0,1]}.$

Proof. The assertion of the theorem follows from the statements

(3.7)  $T A_n^0 = T C_n^0,$

(3.8)  $\{T C_n^0(t)\}_{t\in[0,1]} \stackrel{D}{\longrightarrow} \{W\circ F(t)\}_{t\in[0,1]}.$

For a proof of (3.7) we recall that $D_n^0 = C_n^0 - A_n^0$ by (2.21) and obtain by a straightforward calculation from the definitions (3.5) and (2.23)

$T D_n^0(t) = T C_n^0(t) - T A_n^0(t)$

$= D_n^0(t) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^1 \beta^{-1}(z)\,g(z)\,g^T(z)\,F(dz)\,F(dy)\;A^{-1}\Big(\frac{\sqrt{n}}{n-r}\sum_{k=r+1}^{n} Z_{k,n}\,g(t_{k,n})\Big)$

$= D_n^0(t) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,H(y)\,F(dy)\;A^{-1}\Big(\frac{\sqrt{n}}{n-r}\sum_{k=r+1}^{n} Z_{k,n}\,g(t_{k,n})\Big)$

(3.9)  $= D_n^0(t) - B_t^T(1/\beta)\,A^{-1}\Big(\frac{\sqrt{n}}{n-r}\sum_{k=r+1}^{n} Z_{k,n}\,g(t_{k,n})\Big) = 0.$

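The process $T C_n^0$ is a sum of $r$-dependent random variables. Therefore, weak convergence of the finite dimensional distributions and tightness can be shown using similar arguments as in Dette and Hetzler (2008). Thus the assertion follows by showing that the covariance kernel of the limiting process is given by $F(s\wedge t)$. For the calculation of the asymptotic covariances we use the representation

(3.10)  $T C_n^0(t) = \frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} C_{i,n}(t),$

where

$C_{i,n}(t) = 1\{t_{i,n}\le t\}\,Z_{i,n}\,\beta^{-1/2}(t_{i,n}) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,1\{t_{i,n}\ge y\}\,g(t_{i,n})\,Z_{i,n}\,\beta^{-1}(t_{i,n})\,F(dy)$

(3.11)  $= C_{i,n}^{(1)}(t) - C_{i,n}^{(2)}(t)$

and the last line defines the random variables $C_{i,n}^{(1)}(t)$ and $C_{i,n}^{(2)}(t)$ in an obvious manner. In what follows, $r$ and $s$ in the arguments of the processes denote time points in $[0,1]$, whereas $r$ in the summation limits is the order of the difference sequence. Observing that

(3.12)  $E[Z_{i,n}^2] + 2\sum_{m=1}^{r} E[Z_{i,n}Z_{i+m,n}] = \beta(t_{i,n}) + O(n^{-\gamma})$

[see (2.19) or Dette and Hetzler (2008)], it follows for $r \le s$

$E[C_{i,n}^{(1)}(r)\,C_{i,n}^{(1)}(s)] + 2\sum_{m=1}^{r} E[C_{i,n}^{(1)}(r)\,C_{i+m,n}^{(1)}(s)]$

$= 1\{t_{i,n}\le r\}\,E[Z_{i,n}^2]\,\beta^{-1}(t_{i,n}) + 2\cdot 1\{t_{i,n}\le r\}\sum_{m=1}^{r} E[Z_{i,n}Z_{i+m,n}]\,\beta^{-1}(t_{i,n}) + o(1) = 1\{t_{i,n}\le r\} + o(1).$

This implies

$\frac{n}{(n-r)^2}\sum_{i=r+1}^{n-r}\Big(E[C_{i,n}^{(1)}(r)\,C_{i,n}^{(1)}(s)] + 2\sum_{m=1}^{r} E[C_{i,n}^{(1)}(r)\,C_{i+m,n}^{(1)}(s)]\Big) = F(r) + o(1),$

and similar arguments show

$\frac{n}{(n-r)^2}\sum_{i=r+1}^{n-r}\Big(E[C_{i,n}^{(1)}(r)\,C_{i,n}^{(2)}(s)] + 2\sum_{m=1}^{r} E[C_{i,n}^{(1)}(r)\,C_{i+m,n}^{(2)}(s)]\Big) = \int_0^r \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^r \beta^{-1/2}(x)\,g(x)\,F(dx)\,F(dy) + o(1),$

$\frac{n}{(n-r)^2}\sum_{i=r+1}^{n-r}\Big(E[C_{i,n}^{(1)}(s)\,C_{i,n}^{(2)}(r)] + 2\sum_{m=1}^{r} E[C_{i,n}^{(1)}(s)\,C_{i+m,n}^{(2)}(r)]\Big) = \int_0^r \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^s \beta^{-1/2}(x)\,g(x)\,F(dx)\,F(dy) + o(1),$

$\frac{n}{(n-r)^2}\sum_{i=r+1}^{n-r}\Big(E[C_{i,n}^{(2)}(r)\,C_{i,n}^{(2)}(s)] + 2\sum_{m=1}^{r} E[C_{i,n}^{(2)}(r)\,C_{i+m,n}^{(2)}(s)]\Big) = \int_0^r\!\!\int_0^s \beta^{-1/2}(y_1)\,g^T(y_1)\,H^{-1}(y_1)\,H(y_1\vee y_2)\,H^{-1}(y_2)\,\beta^{-1/2}(y_2)\,g(y_2)\,F(dy_2)\,F(dy_1) + o(1).$

A combination of these results and an application of Fubini's theorem yield

$E[T C_n^0(r)\,T C_n^0(s)] = \frac{n}{(n-r)^2}\sum_{i=r+1}^{n-r}\Big(E[C_{i,n}(r)\,C_{i,n}(s)] + 2\sum_{m=1}^{r} E[C_{i,n}(r)\,C_{i+m,n}(s)]\Big) + o(1)$

$= F(r) - \int_0^r \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^r \beta^{-1/2}(x)\,g(x)\,F(dx)\,F(dy)$

$\quad - \int_0^r \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^s \beta^{-1/2}(x)\,g(x)\,F(dx)\,F(dy)$

$\quad + \int_0^r\!\!\int_0^s \beta^{-1/2}(y_1)\,g^T(y_1)\,H^{-1}(y_1)\,H(y_1\vee y_2)\,H^{-1}(y_2)\,\beta^{-1/2}(y_2)\,g(y_2)\,F(dy_2)\,F(dy_1) + o(1)$

$= F(r) + o(1),$

which implies the assertion of the theorem. □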
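The final step of the covariance calculation rests on a cancellation that is not spelled out. The following worked equation is a sketch of that step; the shorthand $a(y) := \beta^{-1/2}(y)\,g(y)$ is introduced here only for brevity, and $r \le s$ denote time points.

```latex
% Shorthand (introduced here only): a(y) := \beta^{-1/2}(y) g(y), with r \le s.
% Since H(y_1 \vee y_2) coincides with H(y_1) or H(y_2), one inverse cancels:
\begin{align*}
H^{-1}(y_1)\,H(y_1 \vee y_2)\,H^{-1}(y_2) &= H^{-1}(y_1 \wedge y_2), \\
\int_0^r\!\!\int_0^s a^T(y_1)\,H^{-1}(y_1 \wedge y_2)\,a(y_2)\,F(dy_2)\,F(dy_1)
  &= \int_0^r a^T(y)\,H^{-1}(y) \int_y^r a(x)\,F(dx)\,F(dy) \\
  &\quad + \int_0^r a^T(y)\,H^{-1}(y) \int_y^s a(x)\,F(dx)\,F(dy).
\end{align*}
```

The first integral on the right collects the region $y_2 \le y_1$ (using the symmetry of $H$), the second the region $y_1 < y_2$. The double integral therefore equals the sum of the two cross terms arising from the product of $C^{(1)} - C^{(2)}$ with itself, which enter with a negative sign, so that only $F(r)$ remains.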

4 The martingale transform of the process $\{\hat A_n(t)\}_{t\in[0,1]}$

As pointed out in Section 2, the process $\{\sqrt{n}(\hat S_t(1/\beta) - S_t(1/\beta))\}_{t\in[0,1]}$ (or its asymptotically equivalent counterpart $\{A_n^0(t)\}_{t\in[0,1]}$) depends on the unknown function $\beta$ (more precisely on the unknown functions $\sigma^2(\cdot)$ and $m_4(\cdot)$). Similarly, the operator $T$ defined by (3.5) is not completely known and has to be estimated from the data. In this section we propose an empirical process where the unknown quantities have been replaced by estimates, and study the application of an empirical version of the martingale transform. For this purpose we first have to specify the estimate $\hat\beta$ in the process $\{\hat A_n(t)\}_{t\in[0,1]}$ defined in (2.24). We consider the Nadaraya-Watson weights

(4.1)  $w_{ij} = \frac{K\big((t_{i,n}-t_{j,n})/h\big)}{\sum_{l=1}^{n} K\big((t_{i,n}-t_{l,n})/h\big)}, \quad i,j = 1,\dots,n,$

at the points $t_{i,n}$ $(i = 1,\dots,n)$, where $K$ denotes a symmetric kernel function and $h$ defines a bandwidth converging to $0$ with increasing sample size. The estimate of the function $\beta(\cdot)$ is now defined by

(4.2)  $\hat\beta(t_{i,n}) = \sum_{j=1}^{n} w_{ij}\big(Y_{j,n} - \hat m_h(t_{j,n})\big)^4 + (4\delta_r - 1)\sum_{j=1}^{n-r-1} w_{ij}\big(Y_{j,n} - \hat m_h(t_{j,n})\big)^2\big(Y_{j+r+1,n} - \hat m_h(t_{j+r+1,n})\big)^2,$

where $\hat m_h(t_{i,n}) = \sum_{j=1}^{n} w_{ij}\,Y_{j,n}$ denotes the Nadaraya-Watson estimate at the point $t_{i,n}$ $(i = 1,\dots,n)$. Throughout this paper we assume that

(H) The bandwidth $h$ satisfies $h = h_n = O(n^{-1/(2\gamma+1)})$, where $\gamma > 1/2$ denotes the order of Lipschitz continuity defined in Section 2.

(K) The kernel $K$ is symmetric, nonnegative, supported on the interval $[-1,1]$ and satisfies $K(u) \le 1$ for all $u \in [-1,1]$ and $K(u) \ge \kappa$ for all $|u| \le 1/2$, where $\kappa > 0$.
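The kernel estimate of $\beta$ can be sketched as follows, assuming NumPy. The function names are illustrative, the Epanechnikov kernel is one admissible choice under (K), and the formula for $\hat\beta$ used here (a kernel-weighted fourth power of the residuals plus $(4\delta_r - 1)$ times a kernel-weighted product of squared residuals at lag $r+1$) is an assumption of this sketch about the form of (4.2).

```python
import numpy as np

def epanechnikov(u):
    """Symmetric, nonnegative, supported on [-1, 1], bounded by 1, >= 9/16 on |u| <= 1/2."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def nw_weights(t, h, kernel=epanechnikov):
    """Nadaraya-Watson weights w_ij as in (4.1); each row sums to one."""
    k = kernel((t[:, None] - t[None, :]) / h)
    return k / k.sum(axis=1, keepdims=True)

def beta_hat(t, y, h, r, delta_r):
    """Kernel estimate of beta at the design points (assumed form, see lead-in)."""
    w = nw_weights(t, h)
    res2 = (y - w @ y) ** 2                  # squared residuals (Y_j - m_hat(t_j))^2
    lag = r + 1
    fourth = w @ res2 ** 2                   # sum_j w_ij (Y_j - m_hat)^4
    # sum_{j=1}^{n-r-1} w_ij * res2_j * res2_{j+r+1}
    cross = w[:, : len(t) - lag] @ (res2[:-lag] * res2[lag:])
    return fourth + (4.0 * delta_r - 1.0) * cross

rng = np.random.default_rng(0)
t = np.arange(1, 51) / 51.0
y = 1 + t + rng.standard_normal(50)
w = nw_weights(t, 0.2)
b = beta_hat(t, y, 0.2, r=1, delta_r=0.25)   # for r = 1 the second term vanishes
```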

It will be proved in the appendix that under these additional assumptions

(4.3)  $\sup_{t\in[0,1]}\Big|\frac{1}{\sqrt{n}}\sum_{i=r+1}^{n} 1\{t_{i,n}\le t\}\,Z_{i,n}\,\big\{\hat\beta(t_{i,n}) - \beta(t_{i,n})\big\}\Big| = o_p(1),$

and similar arguments as given in Dette and Hetzler (2008) show that

(4.4)  $\hat A_n(t) = \sqrt{n}\big(\hat S_t(1/\hat\beta) - S_t(1/\hat\beta)\big) = \hat A_n^1(t) + o_p(1)$

uniformly with respect to $t \in [0,1]$. In this representation, $\{\hat A_n^1(t)\}_{t\in[0,1]}$ denotes the process obtained from $\{A_n^0(t)\}_{t\in[0,1]}$ by replacing $\beta(t)$ by its estimate $\hat\beta(t)$ defined in (4.2) and the vector $B_t(1/\beta)$ and the matrix $A$ by their estimates $\hat B_t(1/\hat\beta)$ and $\hat A$ defined in (2.7) and (2.4), that is

(4.5)  $\hat A_n^1(t) = C_n^1(t) - D_n^1(t),$

where

(4.6)  $C_n^1(t) = \frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} 1\{t_{i,n}\le t\}\,Z_{i,n}\,\hat\beta^{-1/2}(t_{i,n}),$

$D_n^1(t) = \hat B_t^T(1/\hat\beta)\,\hat A^{-1}\,\frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} Z_{i,n}\,g(t_{i,n}).$

Similarly, we replace the operator $T$ by its empirical version defined by

(4.7)  $(T_n\eta)(t) = \eta(t) - \int_0^t \hat\beta^{-1/2}(y)\,g^T(y)\,H_n^{-1}(y)\int_y^1 \hat\beta^{-1/2}(z)\,g(z)\,\eta(dz)\,F_n(dy),$

where the matrix $H_n(x)$ is given by

(4.8)  $H_n(x) = \int_x^1 \hat\beta^{-1}(u)\,g(u)\,g^T(u)\,F_n(du) = \frac{1}{n}\sum_{i=1}^{n} 1\{t_{i,n}\ge x\}\,\hat\beta^{-1}(t_{i,n})\,g(t_{i,n})\,g^T(t_{i,n})$

and $F_n(t) = \frac{1}{n}\sum_{i=1}^{n} 1\{t_{i,n}\le t\}$ denotes the empirical distribution function of the design points. Note that the matrix $H(x)$ used in the transformation (3.5) is singular at the point $x = 1$, and as a consequence the matrices $H^{-1}(x)$ are unbounded as $x \to 1$. To circumvent this difficulty, we restrict the process $T_n\hat A_n^1$ to the interval $[0,t_0]$ with a fixed $0 < t_0 < 1$. This approach was also suggested by Khmaladze (1993) and Stute, Thies and Zhu (1998) among others. The following results show that the asymptotic properties of the processes $\{T A_n^0(t)\}_{t\in[0,t_0]}$ and $\{T_n\hat A_n^1(t)\}_{t\in[0,t_0]}$ coincide, and as a consequence we obtain weak convergence of the martingale transform of the process defined on the left hand side of (4.4).
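For a step process that jumps only at the design points, the empirical transform (4.7) reduces to cumulative sums and one linear solve per design point. The sketch below assumes NumPy, sorted design points and a pure jump process; the function name `martingale_transform` and the toy inputs are illustrative and not part of the paper.

```python
import numpy as np

def martingale_transform(incr, beta, g, t, t0):
    """Discretised T_n of (4.7) for a step process with jump incr[i] at t[i].

    beta holds (estimated) values beta(t_i), g is the (n, d) matrix g(t_i)^T.
    Returns (T_n eta)(t_k) for all design points t_k <= t0.
    """
    n = len(t)
    a = g / np.sqrt(beta)[:, None]               # rows: beta^{-1/2}(t_i) g(t_i)^T
    outer = a[:, :, None] * a[:, None, :]        # beta^{-1} g g^T at each t_i
    # H_n(t_i) = (1/n) * sum over j >= i, computed as a reversed cumulative sum, see (4.8).
    h = np.cumsum(outer[::-1], axis=0)[::-1] / n
    # U(t_i) = sum over j >= i of beta^{-1/2}(t_j) g(t_j) * incr[j].
    u = np.cumsum((a * incr[:, None])[::-1], axis=0)[::-1]
    keep = t <= t0                               # H_n becomes singular near t = 1
    corr = np.array([a[i] @ np.linalg.solve(h[i], u[i])
                     for i in np.flatnonzero(keep)]) / n
    return np.cumsum(incr)[keep] - np.cumsum(corr)

# Sanity check on a process of the form B_t^T v, which the transform should annihilate.
n = 60
t = np.arange(1, n + 1) / (n + 1)
g = np.column_stack([np.ones(n), t ** 2])
beta = 1.0 + t
v = np.array([0.3, -1.2])
incr = (g / np.sqrt(beta)[:, None]) @ v / n      # jumps of B_hat_t(1/beta)^T v
out = martingale_transform(incr, beta, g, t, t0=0.9)   # vanishes up to rounding error
```

The check mirrors the argument used for $D_n^0$ and $D_n^1$ in the text: for increments of the form $\hat B_t^T v$ the inner sum equals $H_n(t_i)\,v$, so the correction term reproduces the process itself.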

Theorem 4.1. If the assumptions stated at the beginning of Section 2 and the assumptions (H) and (K) are satisfied, then for any $0 < t_0 < 1$ the process $\{T_n\hat A_n^1(t)\}_{t\in[0,t_0]}$ converges weakly on $D[0,t_0]$ to a Brownian motion in time $F$, that is

$\{T_n\hat A_n^1(t)\}_{t\in[0,t_0]} \stackrel{D}{\longrightarrow} \{W\circ F(t)\}_{t\in[0,t_0]}.$

Corollary 4.2. If the assumptions of Theorem 4.1 are satisfied, then for any $0 < t_0 < 1$ the process $\{T_n\hat A_n(t)\}_{t\in[0,t_0]}$ converges weakly on $D[0,t_0]$ to a Brownian motion in time $F$, that is

$\{T_n\hat A_n(t)\}_{t\in[0,t_0]} \stackrel{D}{\longrightarrow} \{W\circ F(t)\}_{t\in[0,t_0]}.$



Proof of Theorem 4.1. The assertion follows from the statement

(4.9)  $\sup_{t\in[0,t_0]} |T A_n^0(t) - T_n\hat A_n^1(t)| = o_p(1).$

In order to prove the estimate (4.9) we note that (using the notation $D_n^1 = C_n^1 - \hat A_n^1$)

$T_n D_n^1(t) = D_n^1(t) - \int_0^t \hat\beta^{-1/2}(y)\,g^T(y)\,H_n^{-1}(y)\int_y^1 \hat\beta^{-1/2}(z)\,g(z)\,D_n^1(dz)\,F_n(dy)$

$= D_n^1(t) - \int_0^t \hat\beta^{-1/2}(y)\,g^T(y)\,F_n(dy)\;\hat A^{-1}\Big(\frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} Z_{i,n}\,g(t_{i,n})\Big)$

$= D_n^1(t) - \hat B_t^T(1/\hat\beta)\,\hat A^{-1}\,\frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} Z_{i,n}\,g(t_{i,n}) = 0.$

Consequently (observing the corresponding result $T A_n^0 = T C_n^0$ in (3.9)), the assertion follows if the statement

(4.10)  $\sup_{t\in[0,t_0]} |T C_n^0(t) - T_n C_n^1(t)| = o_p(1)$

can be proved, where

$T C_n^0(t) = C_n^0(t) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\int_y^1 \beta^{-1/2}(z)\,g(z)\,C_n^0(dz)\,F(dy) = C_n^0(t) - B_n^0(t),$

$T_n C_n^1(t) = C_n^1(t) - \int_0^t \hat\beta^{-1/2}(y)\,g^T(y)\,H_n^{-1}(y)\int_y^1 \hat\beta^{-1/2}(z)\,g(z)\,C_n^1(dz)\,F_n(dy) = C_n^1(t) - B_n^1(t),$

$C_n^0$ and $C_n^1$ are defined in (2.22) and (4.6), respectively, and the equalities define the processes $B_n^0$ and $B_n^1$ in an obvious manner. It follows by a Taylor expansion, by the estimate (4.3) and the estimate

(4.11)  $\sup_{t\in[0,t_0]} |\hat m_h(t) - m(t)| = O_p\big(n^{-\gamma/(2\gamma+1)}\sqrt{\log n}\big)$

[see Mack and Silverman (1982)] that

(4.12)  $\sup_{t\in[0,t_0]} |C_n^1(t) - C_n^0(t)| = o_p(1).$

We now consider the remaining difference

$B_n^1(t) - B_n^0(t) = \int_0^t \hat\beta^{-1/2}(y)\,g^T(y)\,H_n^{-1}(y)\,\hat U_n(y)\,F_n(dy) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,F(dy)$

$= \int_0^t \big(\hat\beta^{-1/2}(y) - \beta^{-1/2}(y)\big)\,g^T(y)\,\big(H_n^{-1}(y) - H^{-1}(y)\big)\,\big(\hat U_n(y) - U_n(y)\big)\,F_n(dy)$

$\quad + \int_0^t \beta^{-1/2}(y)\,g^T(y)\,\big(H_n^{-1}(y) - H^{-1}(y)\big)\,\big(\hat U_n(y) - U_n(y)\big)\,F_n(dy)$

$\quad + \int_0^t \big(\hat\beta^{-1/2}(y) - \beta^{-1/2}(y)\big)\,g^T(y)\,H^{-1}(y)\,\big(\hat U_n(y) - U_n(y)\big)\,F_n(dy)$

$\quad + \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,\big(\hat U_n(y) - U_n(y)\big)\,F_n(dy)$

$\quad + \int_0^t \big(\hat\beta^{-1/2}(y) - \beta^{-1/2}(y)\big)\,g^T(y)\,\big(H_n^{-1}(y) - H^{-1}(y)\big)\,U_n(y)\,F_n(dy)$

$\quad + \int_0^t \beta^{-1/2}(y)\,g^T(y)\,\big(H_n^{-1}(y) - H^{-1}(y)\big)\,U_n(y)\,F_n(dy)$

$\quad + \int_0^t \big(\hat\beta^{-1/2}(y) - \beta^{-1/2}(y)\big)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,F_n(dy)$

$\quad + \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,F_n(dy) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,F(dy)$

(4.13)  $= T_{n,1}(t) + \dots + T_{n,7}(t) + \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,F_n(dy) - \int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,F(dy),$

where the last equality defines the terms $T_{n,1}(t),\dots,T_{n,7}(t)$ in an obvious manner and we have used the notation

$\hat U_n(y) = \int_y^1 \hat\beta^{-1/2}(z)\,g(z)\,C_n^1(dz), \qquad U_n(y) = \int_y^1 \beta^{-1/2}(z)\,g(z)\,C_n^0(dz).$

The nine terms in this expression are estimated separately. Because the treatment of the first seven terms is similar, we illustrate the arguments for $T_{n,6}(t)$ only. This term is bounded by

$\sup_{y\in[0,t_0]} \|H_n^{-1}(y) - H^{-1}(y)\|\;\tilde T_n,$

where

$\tilde T_n := \int_0^t |\beta^{-1/2}(y)|\,\|g^T(y)\|\,\|U_n(y)\|\,F_n(dy) \le \frac{1}{n}\sum_{i=1}^{n} |\beta^{-1/2}(t_{i,n})|\,\|g^T(t_{i,n})\|\,\Big\|\frac{\sqrt{n}}{n-r}\sum_{j=r+1}^{n} 1\{t_{j,n}\ge t_{i,n}\}\,g(t_{j,n})\,Z_{j,n}\,\beta^{-1}(t_{j,n})\Big\|,$

$\|\cdot\|$ denotes the euclidean norm on $\mathbb{R}^d$ and its induced matrix norm on $\mathbb{R}^{d\times d}$ simultaneously, and we have used the definition of $U_n(y)$. A straightforward application of the Cauchy-Schwarz inequality shows

$E\,\tilde T_n \le \frac{1}{n}\sum_{i=1}^{n} |\beta^{-1/2}(t_{i,n})|\,\|g^T(t_{i,n})\|\,\Big(E\Big\|\frac{\sqrt{n}}{n-r}\sum_{j=r+1}^{n} 1\{t_{j,n}\ge t_{i,n}\}\,g(t_{j,n})\,Z_{j,n}\,\beta^{-1}(t_{j,n})\Big\|^2\Big)^{1/2} = O(1)$

uniformly with respect to $t \in [0,t_0]$, which implies $\tilde T_n = O_p(1)$ uniformly on the interval $[0,t_0]$. Using similar arguments as in the proof of the estimate (4.3) it can be shown that

$\max_{i=1,\dots,n} |\hat\beta(t_{i,n}) - \beta(t_{i,n})| = o_p(1).$

By a Taylor expansion and the assumption (1.2) on the design it follows that

(4.14)  $\sup_{y\in[0,t_0]} \|H_n^{-1}(y) - H^{-1}(y)\| = o_p(1),$

and we obtain that $T_{n,6}(t)$ is of order $o_p(1)$ uniformly with respect to $t \in [0,t_0]$. Using similar arguments it follows that $T_{n,1}(t),\dots,T_{n,5}(t)$ and $T_{n,7}(t)$ are also of order $o_p(1)$ uniformly in $t \in [0,t_0]$. For the difference of the last two terms on the right-hand side of (4.13) we show the estimate

(4.15)  $\sup_{t\in[0,t_0]}\Big|\int_0^t \beta^{-1/2}(y)\,g^T(y)\,H^{-1}(y)\,U_n(y)\,\big(F_n(dy) - F(dy)\big)\Big| = o_p(1)$

using Lemma 6.6.4 in Koul (2002). Note that for the application of this result one has to show the tightness of the process $\{U_n(x)\}_{x\in[0,t_0]}$. For this purpose we consider the components of $U_n$ separately, that is

$U_n^{(p)}(x) = \frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} 1\{t_{i,n}\ge x\}\,\sigma_p^2(t_{i,n})\,\beta^{-1}(t_{i,n})\,Z_{i,n}, \quad p = 1,\dots,d,$

and introduce the notation

$u_p(x) = 1\{y_1\le x< y_2\}\,\sigma_p^2(x)\,\beta^{-1}(x).$

Now a similar calculation as in Dette and Hetzler (2008) yields

$E\big[\big(U_n^{(p)}(y_2) - U_n^{(p)}(y_1)\big)^4\big] = \frac{n^2}{(n-r)^4}\,E\Big[\Big(\sum_{i=r+1}^{n} u_p(t_{i,n})\,Z_{i,n}\Big)^4\Big] \le C\,(y_2 - y_1)^2$

for some constant $C > 0$ and $0 \le y_1 < y_2 \le t_0$. This implies tightness of each component $U_n^{(p)}$ [see Billingsley (1999)] and as a consequence tightness of the process $U_n$ [see Billingsley (1979)]. □

Remark 4.3. Theorem 4.1 and Corollary 4.2 remain correct if the Nadaraya-Watson weights in the estimate $\hat\beta$ defined in (4.2) are replaced by local linear weights. This follows by a careful inspection of the proof of the estimate (4.3) in the appendix. In practical applications the use of local linear weights is strongly recommended because of the better performance of the local linear estimate at the boundary of the design space.
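Local linear weights can be written in closed form and used in place of the Nadaraya-Watson weights. The following sketch assumes NumPy and an Epanechnikov kernel; the function names are illustrative. The defining property, which also explains the better boundary behaviour, is that the weights reproduce linear functions exactly.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric kernel supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear_weights(t, h, kernel=epanechnikov):
    """Weights w_ij of the local linear smoother at the design points t_i."""
    diff = t[None, :] - t[:, None]                     # diff[i, j] = t_j - t_i
    k = kernel(diff / h)
    s1 = (k * diff).sum(axis=1, keepdims=True)         # first kernel moment
    s2 = (k * diff ** 2).sum(axis=1, keepdims=True)    # second kernel moment
    raw = k * (s2 - diff * s1)
    # The row sums equal s0 * s2 - s1^2, which is positive when the window
    # contains at least two distinct design points.
    return raw / raw.sum(axis=1, keepdims=True)

t = np.arange(1, 41) / 41.0
w_ll = local_linear_weights(t, 0.2)
fit = w_ll @ (2.0 + 3.0 * t)       # smoothing a linear function
```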
5 Finite sample properties



In this section we investigate the finite sample properties of the new test by means of a simulation 
study. We have generated data according to the model 

(5.1)  $Y_{i,n} = 1 + t_{i,n} + \sigma(t_{i,n})\,\varepsilon_{i,n}, \quad i = 1,\dots,n,$

where $t_{i,n} = i/(n+1)$, $i = 1,\dots,n$, and simulated the power of the test for the hypothesis

(5.2)  $H_0 : \sigma^2(t) = 1 + \theta t^2$
and the variance functions 

(5.3)  $\sigma^2(t) = 0.5 + 3t^2 + 2.5\,c\,\sin(2\pi t),$

(5.4)  $\sigma^2(t) = 0.5 + 3t^2 + 2c\,e^{2t},$

(5.5)  $\sigma^2(t) = 0.5 + 3t^2 + 4c\sqrt{t}.$

Note that the choice $c = 0$ in (5.3) - (5.5) corresponds to the null hypothesis of a quadratic variance function. The errors $\varepsilon_{i,n}$ are standard normally distributed and we use a difference sequence of order $r = 1$ for the calculation of the pseudo residuals $R_{i,n}$, which determines the weights as $d_0 = -d_1 = 1/\sqrt{2}$ and yields $\beta(x) = m_4(x)\,\sigma^4(x)$. In order to apply the test we have to calculate the transformation

$T_n\hat A_n(t) = T_n\big(\sqrt{n}(\hat S_t(1/\hat\beta) - S_t(1/\hat\beta))\big)$

for the process $\hat A_n(t)$ given in (4.4). Under the null hypothesis (5.2) we have $S_t(1/\hat\beta) = 0$ for all $t \in [0,1]$, and the process $\hat A_n(t)$ can be written as

$\hat A_n(t) = \sqrt{n}\,\hat S_t(1/\hat\beta) = \hat C_n(t) - \hat D_n(t)$

with

$\hat C_n(t) = \frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} 1\{t_{i,n}\le t\}\,R_{i,n}^2\,\hat\beta^{-1/2}(t_{i,n}),$

$\hat D_n(t) = \hat B_t^T(1/\hat\beta)\,\hat A^{-1}\,\frac{\sqrt{n}}{n-r}\sum_{i=r+1}^{n} R_{i,n}^2\,g(t_{i,n}).$

By a similar argument as given in the proof of Theorem 4.1 it can be shown that $T_n\hat D_n(t) = 0$ for all $t \in [0,1]$, and as a consequence it is sufficient to calculate the transformation $T_n\hat C_n$. We use the Cramér-von Mises statistic $G_n = \int (T_n\hat A_n)^2(t)\,dF_n(t)$, and from Corollary 4.2 and the Continuous Mapping Theorem it follows that

(5.6)  $G_n = \int (T_n\hat A_n)^2(t)\,dF_n(t) \stackrel{D}{\longrightarrow} \int W^2(F(t))\,dF(t) = \int W^2(t)\,dt,$

where $W$ denotes a standard Brownian motion. If $w_\alpha$ denotes the $1-\alpha$ quantile of the distribution of the random variable $\int W^2(t)\,dt$, then the test which rejects the null hypothesis (5.2) if

(5.7)  $G_n > w_\alpha$

has asymptotically level $\alpha$ and is consistent against local alternatives converging to the null hypothesis at a rate $n^{-1/2}$. As an estimator of the function $\beta(x) = m_4(x)\,\sigma^4(x)$ we use the estimator (4.2), where $\hat m_h(\cdot)$ is the local linear estimator of the regression function. The bandwidth for the calculation of the local linear estimate was determined by least squares cross validation. If $h_{CV}$ is the bandwidth obtained by this procedure, the bandwidth in the estimator (4.2) was chosen as $h_{CV}/2$.
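The critical value $w_\alpha$ in (5.7) is a quantile of an integrated squared Brownian motion and can be approximated by simulation. The sketch below assumes NumPy and, as an assumption of this example, takes the integral over $[0,1]$; if the statistic is restricted to $[0,t_0]$, the upper limit changes accordingly. The function name `cvm_quantile` and the simulation sizes are illustrative.

```python
import numpy as np

def cvm_quantile(alpha, n_rep=10000, n_steps=400, seed=0):
    """Monte Carlo approximation of the (1 - alpha) quantile of int_0^1 W(t)^2 dt."""
    rng = np.random.default_rng(seed)
    # Brownian increments over a grid of width 1 / n_steps.
    incr = rng.normal(scale=np.sqrt(1.0 / n_steps), size=(n_rep, n_steps))
    paths = np.cumsum(incr, axis=1)
    integrals = np.mean(paths ** 2, axis=1)    # Riemann sum of W^2 over [0, 1]
    return np.quantile(integrals, 1.0 - alpha), integrals

w_05, draws = cvm_quantile(0.05)
# The test (5.7) would reject at level 5% if the observed G_n exceeds w_05.
```

Since $E\int_0^1 W^2(t)\,dt = 1/2$, the sample mean of the simulated integrals gives a quick plausibility check of the simulation.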

In each scenario 1000 simulation runs were performed to calculate the rejection probabilities, which are shown in Table 5.1. For the sake of comparison, the table also contains the corresponding rejection probabilities of the bootstrap test proposed by Dette and Hetzler (2008), which are displayed in brackets. If the null hypothesis is satisfied ($c = 0$), we observe a rather precise approximation of the nominal level in all cases, even for the sample size $n = 50$. Under the alternatives the behaviour of the two tests is different. In model (5.3) the bootstrap test yields a substantially larger power than the test based on the martingale transformation, in particular for the sample sizes $n = 50$ and $n = 100$. In model (5.5) the situation is similar for the sample size $n = 50$, but the differences between the rejection probabilities of the two tests are smaller. Moreover, for the sample size $n = 200$, the test based on the martingale transform shows a better performance. In model (5.4) the bootstrap test is more powerful for the sample size $n = 50$, while for the sample sizes $n = 100$ and $n = 200$ the test based on the martingale transformation always yields a larger power than the bootstrap test.
                       n = 50                  n = 100                 n = 200
model   c      .025    .05     .10     .025    .05     .10     .025    .05     .10

[?]     0      [?]     .045    .090    .026    .049    .086    .023    .047    [?]
               (.024)  (.049)  (.093)  (.034)  (.059)  (.106)  (.023)  (.046)  (.086)
        0.5    [?]     [?]     .292    .246    .359    .509    [?]     [?]     [?]
               [?]     [?]     [?]     [?]     [?]     [?]     [?]     [?]     [?]
        1      [?]     [?]     [?]     [?]     [?]     [?]     [?]     [?]     [?]
               [?]     [?]     [?]     [?]     [?]     [?]     [?]     [?]     [?]

(5.3)   0      .029    .056    .107    .027    .047    .087    .027    .048    .094
               (.019)  (.041)  (.097)  (.025)  (.044)  (.098)  (.023)  (.047)  (.102)
        0.5    .088    .159    .260    .239    .355    .473    .517    .646    .765
               (.231)  (.303)  (.400)  (.327)  (.429)  (.556)  (.537)  (.637)  (.734)
        1      .162    .273    .414    .386    .532    .685    .825    .910    .957
               (.381)  (.484)  (.586)  (.565)  (.667)  (.756)  (.775)  (.847)  (.911)

[?] marks entries that are illegible in the source.

Table 5.1. Rejection probabilities of the Cramér-von-Mises test (5.7) for the hypothesis (5.2) in the regression model (5.1). The corresponding rejection probabilities of the bootstrap test proposed by Dette and Hetzler (2008) are displayed in brackets.
6 Appendix: Proof of (4.3)



Throughout this section we omit the index $n$; in particular we write $t_j$ and $Z_j$ instead of $t_{j,n}$ and $Z_{j,n}$, respectively. For the sake of brevity we only indicate the main steps of the proof; details can be found in Hetzler (2008). Furthermore we restrict ourselves to the case $\sigma = 1$ and $r = 1$ and note that the general case is proved in exactly the same way with some additional notation. This simplification yields for the random variables $Z_i$

$Z_i = d_0\,\sigma(t_i)\varepsilon_i + d_1\,\sigma(t_{i-1})\varepsilon_{i-1}.$

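A straightforward calculation gives

(6.1) $A(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \bigl(\hat\beta(t_i) - \beta(t_i)\bigr) = \sum_{j=1}^{5} A_j(t),$

where

$A_1(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij} \bigl(\varepsilon_j^4 - m_4(t_i)\bigr),$

$A_2(t) = \frac{4}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \bigl(m(t_j) - \hat m_h(t_j)\bigr),$

$A_3(t) = \frac{6}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^2 \bigl(m(t_j) - \hat m_h(t_j)\bigr)^2,$

$A_4(t) = \frac{4}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij}\, \varepsilon_j \bigl(m(t_j) - \hat m_h(t_j)\bigr)^3,$

$A_5(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij} \bigl(m(t_j) - \hat m_h(t_j)\bigr)^4.$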

We rewrite $m(t_j) - \hat m_h(t_j) = \rho_j - \sum_{k=1}^{n} w_{jk}\varepsilon_k$ with

(6.2) $\rho_j := m(t_j) - \sum_{k=1}^{n} w_{jk}\, m(t_k) = \sum_{k=1}^{n} w_{jk} \bigl(m(t_j) - m(t_k)\bigr)$

and first consider the term $A_1$. For its expectation we have

$E A_1(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, E\Bigl[Z_i \sum_{j=1}^{n} w_{ij} h_{ij}\Bigr],$

where we used the notation $h_{ij} := \varepsilon_j^4 - m_4(t_i)$. Note that $|E h_{ij}| = |m_4(t_j) - m_4(t_i)| \le L h^{\gamma}$ whenever $|t_j - t_i| \le h$ (recall the Hölder continuity of the function $m_4$) and that it follows from the assumption on the design and the kernel

(6.3) $\frac{K_h(t_j - t_i)}{C_2\, n} \le w_{ij} \le \frac{K_h(t_j - t_i)}{C_1\, n},$

where $K_h(x) = K(x/h)/h$ and $C_1$ and $C_2$ denote positive constants. This yields

$E\Bigl[Z_i \sum_{j=1}^{n} w_{ij} h_{ij}\Bigr] = E\bigl[Z_i (w_{ii} h_{ii} + w_{i,i-1} h_{i,i-1})\bigr] = O\Bigl(\frac{1}{nh}\Bigr)$

uniformly with respect to $i = r+1, \dots, n$, and it follows that $E A_1(t) = O\bigl(\frac{1}{\sqrt{n}\,h}\bigr) = o(1)$. For the estimation of the second moment we decompose $A_1^2(t)$ as follows



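$A_1^2(t) = D_1(t) + D_2(t) + D_3(t),$

with

$D_1(t) = \frac{1}{n} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i^2 \Bigl(\sum_{j=1}^{n} w_{ij} h_{ij}\Bigr)^2,$

$D_2(t) = \frac{2}{n} \sum_{i=1}^{n-1} 1\{t_i \le t\}\, 1\{t_{i+1} \le t\}\, Z_i Z_{i+1} \sum_{j=1}^{n} w_{ij} h_{ij} \sum_{k=1}^{n} w_{i+1,k}\, h_{i+1,k},$

$D_3(t) = \frac{1}{n} \sum_{|i-l| \ge 2} 1\{t_i \le t\}\, 1\{t_l \le t\}\, Z_i Z_l \sum_{j=1}^{n} w_{ij} h_{ij} \sum_{k=1}^{n} w_{lk} h_{lk}.$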

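Observing (6.3) it follows for the set $\mathcal{A}_i := \{i-1, i\}$

$E\Bigl[Z_i^2 \sum_{j,k=1}^{n} w_{ij} w_{ik} h_{ij} h_{ik}\Bigr] = E\Bigl[Z_i^2 \sum_{j,k \notin \mathcal{A}_i} w_{ij} w_{ik} h_{ij} h_{ik}\Bigr] + 2 E\Bigl[Z_i^2 (w_{ii} h_{ii} + w_{i,i-1} h_{i,i-1}) \sum_{k \notin \mathcal{A}_i} w_{ik} h_{ik}\Bigr]$

$\qquad + E\bigl[Z_i^2 (w_{ii} h_{ii} + w_{i,i-1} h_{i,i-1})^2\bigr]$

$= E\Bigl[Z_i^2 \sum_{j,k \notin \mathcal{A}_i} w_{ij} w_{ik} h_{ij} h_{ik}\Bigr] + O(n^{-1} h^{-1}) + O(n^{-2} h^{-2})$

$= O(h^{2\gamma}) + O(n^{-1} h^{-1}) + O(n^{-2} h^{-2})$

$= O(h^{2\gamma}).$

A similar calculation shows $E D_2(t) = O(h^{2\gamma})$. For the remaining estimate for the term $D_3(t)$ we consider the set $\mathcal{A}_{i,l} = \{i-1, i, l-1, l\}$ and obtain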

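$E\Bigl[Z_i Z_l \sum_{j=1}^{n} w_{ij} h_{ij} \sum_{k=1}^{n} w_{lk} h_{lk}\Bigr] = E\Bigl[Z_i Z_l \sum_{j,k \notin \mathcal{A}_{i,l}} w_{ij} w_{lk} h_{ij} h_{lk}\Bigr]$

$\qquad + E\Bigl[Z_i Z_l (w_{ii} h_{ii} + w_{i,i-1} h_{i,i-1} + w_{il} h_{il} + w_{i,l-1} h_{i,l-1}) \sum_{k \notin \mathcal{A}_{i,l}} w_{lk} h_{lk}\Bigr]$

$\qquad + E\Bigl[Z_i Z_l (w_{li} h_{li} + w_{l,i-1} h_{l,i-1} + w_{ll} h_{ll} + w_{l,l-1} h_{l,l-1}) \sum_{j \notin \mathcal{A}_{i,l}} w_{ij} h_{ij}\Bigr]$

$\qquad + E\Bigl[Z_i Z_l \sum_{j,k \in \mathcal{A}_{i,l}} w_{ij} w_{lk} h_{ij} h_{lk}\Bigr].$

Note that the random variables $Z_i$ and $Z_l$ are independent whenever $|l - i| \ge 2$ and consequently the first three terms in the above expression vanish. The remaining fourth term can be decomposed into a sum of 16 terms, which are all of the form

$E[Z_i Z_l\, w_{ii} w_{ll}\, h_{ii} h_{ll}] = O\Bigl(\frac{1}{n^2 h^2}\Bigr).$

This yields $E D_3(t) = O\bigl(\frac{1}{n h^2}\bigr)$, and as a consequence $E A_1^2(t) = O\bigl(\frac{1}{n h^2}\bigr) = o(1)$. Thus we obtain

(6.4) $A_1(t) = o_p(1)$

uniformly with respect to $t \in [0, t_0]$. In order to derive a corresponding estimate for the term $A_2$ we use the decomposition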



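$A_2(t) = A_{21}(t) - A_{22}(t)$

with

$A_{21}(t) = \frac{4}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \rho_j,$

$A_{22}(t) = \frac{4}{\sqrt{n}} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \sum_{k=1}^{n} w_{jk}\varepsilon_k.$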



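Now the Hölder continuity of the regression function implies $|\rho_j| = \bigl|\sum_{k=1}^{n} w_{jk}(m(t_j) - m(t_k))\bigr| \le L h^{\gamma}$ for some positive constant $L$ and a straightforward calculation shows (note that the random variables $Z_i$ depend only on $\varepsilon_{i-1}$ and $\varepsilon_i$)

$E\Bigl[Z_i \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \rho_j\Bigr] = O\Bigl(\frac{h^{\gamma-1}}{n}\Bigr),$

which implies

(6.5) $E[A_{21}(t)] = O\Bigl(\frac{h^{\gamma-1}}{\sqrt{n}}\Bigr).$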
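The bound on the deterministic bias terms can be checked numerically. The sketch below is an illustration under explicit assumptions: the weights are taken in Nadaraya-Watson form with an Epanechnikov kernel (the function `nw_weights` is hypothetical and need not coincide with the weights of the paper), the design is uniform, and $m(t) = |t - 1/2|^{\gamma}$ serves as a regression function that is Hölder continuous with exponent $\gamma$ and constant $L = 1$.

```python
import numpy as np

def nw_weights(t, h):
    """Row-normalised kernel weights w[j, k]; Nadaraya-Watson form with an
    Epanechnikov kernel, assumed here for illustration only."""
    u = (t[:, None] - t[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return k / k.sum(axis=1, keepdims=True)

gamma, L = 0.75, 1.0
n, h = 200, 0.1
t = np.arange(1, n + 1) / (n + 1)        # uniform design points
m = np.abs(t - 0.5) ** gamma             # Hölder continuous, exponent gamma, constant 1

w = nw_weights(t, h)
rho = m - w @ m                          # rho_j = sum_k w_jk (m(t_j) - m(t_k)), rows sum to 1
bound = L * h**gamma
max_abs_rho = float(np.max(np.abs(rho)))
```

Because the weights are nonnegative, sum to one in each row and vanish for $|t_j - t_k| > h$, the quantity `max_abs_rho` cannot exceed `bound`, which is the inequality used in the text.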



By a similar calculation it follows that $E[A_{22}(t)] = O\bigl(\frac{1}{\sqrt{n}\,h}\bigr)$ and a combination of this estimate with (6.5) gives

(6.6) $E[A_2(t)] = O\Bigl(\frac{1}{\sqrt{n}\,h}\Bigr).$

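The estimation of the second moments of $A_{21}(t)$ and $A_{22}(t)$ is more complicated and we indicate the calculations for the term $A_{21}(t)$, whose square can be decomposed as

$A_{21}^2(t) = B_1(t) + B_2(t) + B_3(t),$

where

$B_1(t) = \frac{16}{n} \sum_{i=1}^{n} 1\{t_i \le t\}\, Z_i^2 \Bigl(\sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \rho_j\Bigr)^2,$

$B_2(t) = \frac{32}{n} \sum_{i=1}^{n-1} 1\{t_i \le t\}\, 1\{t_{i+1} \le t\}\, Z_i Z_{i+1} \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \rho_j \sum_{l=1}^{n} w_{i+1,l}\, \varepsilon_l^3 \rho_l,$

$B_3(t) = \frac{16}{n} \sum_{|l-i| \ge 2} 1\{t_i \le t\}\, 1\{t_l \le t\}\, Z_i Z_l \sum_{j=1}^{n} w_{ij}\, \varepsilon_j^3 \rho_j \sum_{r=1}^{n} w_{lr}\, \varepsilon_r^3 \rho_r.$

Using the estimates (6.3) we obtain

$E[B_1(t)] = \frac{16}{n} \sum_{i=1}^{n} E\Bigl[1\{t_i \le t\}\, Z_i^2 \sum_{j=1}^{n} w_{ij}^2\, \varepsilon_j^6 \rho_j^2\Bigr] + \frac{16}{n} \sum_{i=1}^{n} E\Bigl[1\{t_i \le t\}\, Z_i^2 \sum_{j \ne l} w_{ij} w_{il}\, \varepsilon_j^3 \varepsilon_l^3 \rho_j \rho_l\Bigr]$

$= O\Bigl(\frac{h^{2\gamma-1}}{n}\Bigr) + O(h^{2\gamma}) = O(h^{2\gamma}).$

A similar calculation shows $E B_2(t) = O(h^{2\gamma})$ and

$E[B_3(t)] = O\Bigl(\frac{h^{2\gamma-2}}{n}\Bigr),$

which implies

(6.7) $E[A_{21}^2(t)] = O(h^{2\gamma}).$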

Similarly we obtain

$E A_{22}^2(t) = O\Bigl(\frac{1}{n h^2}\Bigr)$

and a combination with (6.7) gives

$E A_2^2(t) = O\Bigl(\frac{1}{n h^2}\Bigr) = O\bigl(n^{\frac{1-2\gamma}{2\gamma+1}}\bigr) = o(1).$

On the other hand we have from (6.6) the estimate $E A_2(t) = O\bigl(\frac{1}{\sqrt{n}\,h}\bigr) = O\bigl(n^{\frac{1-2\gamma}{4\gamma+2}}\bigr) = o(1)$ and it follows that

(6.8) $A_2(t) = o_p(1)$

uniformly on the interval $t \in [0, t_0]$. The term $A_3(t)$ can be treated by similar arguments, which are omitted for the sake of brevity [see Hetzler (2008) for more details]. Tedious calculations yield

(6.9) $A_3(t) = o_p(1)$

uniformly with respect to $t \in [0, t_0]$. Finally we use the estimate (4.11) and obtain for the remaining terms in (6.1)

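$|A_4(t)| \le \frac{4}{\sqrt{n}} \sum_{i=1}^{n} |Z_i| \sum_{j=1}^{n} w_{ij}\, |\varepsilon_j|\, |m(t_j) - \hat m_h(t_j)|^3$

$\le \sup_{t \in [0, t_0]} |\hat m_h(t) - m(t)|^3 \cdot \frac{4}{\sqrt{n}} \sum_{i=1}^{n} |Z_i| \sum_{j=1}^{n} w_{ij}\, |\varepsilon_j|$

$= O_p\Bigl(n^{-\frac{3\gamma}{2\gamma+1}} (\log n)^{3/2}\, n^{\frac{2\gamma+2}{4\gamma+2}}\Bigr) = O_p\Bigl(n^{\frac{2-4\gamma}{4\gamma+2}} (\log n)^{3/2}\Bigr) = o_p(1)$

and

$|A_5(t)| \le \sup_{t \in [0, t_0]} |\hat m_h(t) - m(t)|^4 \cdot \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |Z_i|$

$= O_p\Bigl(n^{-\frac{4\gamma}{2\gamma+1}} \sqrt{n}\, (\log n)^2\Bigr) = O_p\Bigl(n^{\frac{1-6\gamma}{4\gamma+2}} (\log n)^2\Bigr) = o_p(1)$

uniformly in $t \in [0, t_0]$. Combining these estimates with (6.4), (6.8) and (6.9) it follows that $A(t) = o_p(1)$ holds uniformly with respect to $t \in [0, t_0]$, which proves (4.3).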
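The rate calculations in this appendix reduce to arithmetic with exponents of $n$. The following sketch checks this arithmetic with exact fractions, under the assumption (not stated in this section, but consistent with the exponents appearing above) that the bandwidth is of order $h = n^{-1/(2\gamma+1)}$. The function name `rate_exponents` and the value $\gamma = 3/4$ are chosen for illustration.

```python
from fractions import Fraction

def rate_exponents(gamma):
    """Exponents of n for three bounds of the appendix,
    assuming h = n**(-1/(2*gamma + 1)) (an assumption of this sketch)."""
    h_exp = Fraction(-1) / (2 * gamma + 1)          # h = n**h_exp
    return {
        "1/(sqrt(n) h)": Fraction(-1, 2) - h_exp,   # order of E A_2(t)
        "1/(n h^2)": Fraction(-1) - 2 * h_exp,      # order of E A_2(t)^2
        "h^(2 gamma)": 2 * gamma * h_exp,           # order of E A_21(t)^2
    }

gamma = Fraction(3, 4)
exps = rate_exponents(gamma)
# For gamma = 3/4 the three exponents are -1/10, -1/5 and -3/5.
```

All three exponents are negative as soon as $\gamma > 1/2$, which is the condition under which the corresponding terms are $o(1)$.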